Abstract


A  comprehensive  system  for  transformational  grammar  has  been  designed 
and  is  being  implemented  on  the  IBM  360/67  computer.  The  system  deals 
with  the  transformational  model  of  syntax,  along  the  lines  of  Chomsky's 
Aspects  of  the  Theory  of  Syntax.  The  major  innovations  include  a  full 
and  formal  description  of  the  syntax  of  a  transformational  grammar, 
a  directed  random  phrase  structure  generator,  a  lexical  insertion 
algorithm, and a simple problem-oriented programming language in which
the algorithm for application of transformations can be expressed. In
this paper we present the system as a whole, first discussing the
philosophy underlying the development of the system, then outlining
the system and discussing its more important special features.
References are given to papers which consider particular aspects of
the system in detail.
INTRODUCTION


The  computer  system  for  transformational  grammar  presented  in  this 
paper is the outcome of an attempt to write computer programs as aids
to  research  in  transformational  grammar,  in  particular,  as  aids  to 
writing  grammars. 

In the course of this work it soon became apparent that an
essential  prior  task  was  the  formalization  of  a  general  and  inclusive 
notion  of  transformational  grammar.  The  basic  model  is  that  of  Chomsky's 
Aspects  of  the  Theory  of  Syntax  [3] ;  we  have  extended  this  model  to  fill 
in  the  many  missing  details  and  have  formalized  it  to  make  it  precise. 

The system is implemented by a FORTRAN program on the IBM 360/67
computer.  However,  as  a  formal  statement  of  transformational  grammar, 
it  can  be  considered  independently  of  the  program.  We  have  therefore 
relegated  to  one  section  and  to  occasional  footnotes  all  matters  related 
directly  to  the  program. 

This  paper  may  be  considered  as  both  a  summary  of  and  an  introduction 
to  the  system.  We  have  stressed  the  ways  in  which  the  system  is  new, 
and have left the details for other papers, which will be cited.

In  developing  the  system  our  primary  examples  have  been  the  MITRE 
grammar [18], the IBM Core Grammar [13] and the UCLA work on syntax [17].^
However,  we  have  not  limited  the  system  to  matters  treated  in  these 
examples,  but  have  tried  to  be  comprehensive. 

^The  UCLA  work  has  kindly  been  made  available  to  us  in  its  preliminary 
stages  through  unpublished  working  papers  and  memoranda.  We  wish  also 
to thank Barbara Hall Partee of UCLA for numerous discussions which
have  helped  to  clarify  our  ideas  about  transformational  grammar. 
A transformational grammar may be sketchily described as follows.

The  components  of  a  transformational  grammar  are  phrase  structure 
rules,  a  lexicon,  and  a  set  of  transformations.  The  process  of  generating 
a  sentence  consists  first  of  the  generation  of  a  base  tree  using  the 
phrase  structure  rules.  Lexical  items  are  then  attached  appropriately 
by a lexical insertion algorithm. Finally, the base tree with its lexical
items is mapped by application of the transformations in some order into
a  surface  tree.  The  terminal  string  of  the  surface  tree  represents  the 
sentence. 

From the outset we have felt that it was essential to consider a
transformational grammar as a whole. A rule of a grammar may behave
as  intended  in  isolation,  but  in  the  grammar  its  interaction  with  other 
rules  is  crucial.  It  is  precisely  these  interrelations  which  are  most 
difficult  to  control,  and  we  believe  it  is  here  that  a  computer  system 
can  be  most  helpful. 

We  did  not  wish  to  try  to  guess  the  exact  amount  of  power 
required  to  describe  the  syntax  of  natural  language,  nor  to  be  normative 
in  our  approach.  Our  aim  is  to  handle  as  uniformly  and  simply  as  we 
can  the  sorts  of  things  which  do  appear  in  the  current  work  on 
transformational  grammar.  The  formalism  has  been  made  general  enough 
so  that  most  of  the  formal  grammars  and  rules  which  we  have  seen  can  be 
expressed naturally. On the other hand, there are some devices in the
literature which appear to us to be so different in character from the
rest of the material as to be unacceptable in anything like their present
form, and we have not included them.^


^As an example we might cite the distance measure included in the Identity
Erasure  Transformation  of  [13].  This  appears  to  us  to  be  more  properly 
considered  as  a  linguistic  rule,  which  should  be  expressible,  but  which 
should  not  appear  as  part  of  a  particular  transformation.  Further 
comments  on  linguistic  rules  of  this  type  appear  below. 
It is quite likely that at least some linguists will feel that
the generality of the system is excessive. But there is no need for any
one user to employ its full power. In the metalanguage of this system,
a linguist may easily define his own subset of the syntax; we believe
such formalization will make it easier for him to adhere to his conventions.
Although we have not done so, it would be possible to provide
user-oriented subroutines to verify that the user's additional constraints
are not violated.

The  traditional  description  of  a  transformational  grammar  can  be 
given an alternative presentation in terms of basic concepts, components,
and component algorithms. The basic concepts of a grammar are trees,
analyses, restrictions, and complex symbols, with their corresponding
algorithms. The components are phrase structure, lexicon, and
transformations. The component algorithms are phrase structure generation,
lexical insertion and control of transformations. Viewing a
grammar  in  this  way,  we  are  able  to  see  more  clearly  the  basic  problems 
to  be  treated.  It  is  this  breakdown  which  will  be  used  in  the  subsequent 
description. 

We  assume  that  the  reader  is  familiar  with  transformational  grammar. 
The  presentation  is  incomplete;  we  omit  standard  items  and  emphasize  the 
ways in which this system differs from others. While the discussion
below is largely informal, it is important that it is based on the
completely formal syntax of [21].
A METALANGUAGE FOR TRANSFORMATIONAL GRAMMAR


To describe the syntax of a transformational grammar one must
first choose a metalanguage. The usual choice by linguists has been
English. The metalanguage used here is a modification of Backus Naur
Form (BNF), familiar to computer scientists as the language used
in the description of Algol 60. As we will use the symbols | ,
< and > in transformational grammars, we modify the usual BNF by
replacing angular brackets by underlining, e.g. "transformation"
rather than "<transformation>", and using "or" in place of "|".

For linguists unfamiliar with BNF it should suffice to say that

(1) the modified-BNF production " A ::= B C or D or E " corresponds
to the rewriting rule

        { B C }
   A -> { D   }
        { E   }

(2) the nonterminal symbols of modified-BNF are denoted by the
underlined name of the construct, viz. transformational grammar ::=
phrase structure lexicon transformations, (3) symbols not
underlined are used autonymously, and (4) juxtaposition in the
object language is indicated by juxtaposition in the metalanguage.

We refer to the constructs of the metalanguage as "formats",
because they are in fact the free-field formats of the computer system.
We have carried the underlining of format names into the text of the
paper.

Basic to the syntax are the two formats word and integer. A
word is a contiguous string of letters and digits beginning with a
letter; integer is a contiguous string of digits. Except in these two
formats, spaces may be used freely.


If a BNF description is to elucidate a language, it should not
introduce names for intermediate formats which do not have meaning.
In order to avoid additional formats where possible, and to simplify
the description, we have introduced into the metalanguage the five
operators list, clist, opt, booleancombination and choicestructure.
In each case the operand is given within square brackets following the
operator. Only the first three of these operators are used in this
paper. They are:

1. list

a ::= list [ integer ]

allows a to be

1 2 6 9171 3 20

2. clist (comma list)

a ::= clist [ integer ]

allows a to be

1, 2, 6, 9171, 3, 20

3. opt (option)

a ::= opt [ integer ] word

allows a to be either

3 NP or NP
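As an illustration only, the three operators can be sketched as small
recognizers built from regular expressions. The sketch below is in Python,
which is not the language of the system; the helper names (list_of,
clist_of, opt, matches) are hypothetical, and it assumes that a list or
clist contains at least one item, which the text does not state.

```python
import re

# word: letters and digits, beginning with a letter
WORD = r"[A-Za-z][A-Za-z0-9]*"
# integer: a contiguous string of digits
INTEGER = r"[0-9]+"

def list_of(item):
    """list [ item ]: items separated by spaces (assumed non-empty)."""
    return rf"{item}(?:\s+{item})*"

def clist_of(item):
    """clist [ item ]: items separated by commas (assumed non-empty)."""
    return rf"{item}(?:\s*,\s*{item})*"

def opt(item):
    """opt [ item ]: the item may be present or absent."""
    return rf"(?:{item}\s*)?"

def matches(pattern, text):
    """True if the whole text is an instance of the format."""
    return re.fullmatch(rf"\s*{pattern}\s*", text) is not None

if __name__ == "__main__":
    print(matches(list_of(INTEGER), "1 2 6 9171 3 20"))
    print(matches(clist_of(INTEGER), "1, 2, 6, 9171, 3, 20"))
    print(matches(opt(INTEGER) + WORD, "3 NP"))
    print(matches(opt(INTEGER) + WORD, "NP"))
```

Each operator is expressed directly, without naming an intermediate
format, which is the economy the operators are meant to provide.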

It  is  clear  that  any  occurrence  of  an  operator  in  a  production 
could  be  deleted  by  the  introduction  of  intermediate  formats  and 
corresponding  additional  productions.  This  would  not  change  the  object 
language.


A full description of the syntax of transformational grammar is
given in [21]. In this paper we shall give only a few of the productions,
as needed to describe special features of the system.


BASIC CONCEPTS

Each of the basic concepts is used throughout a grammar; they
are defined recursively in terms of one another.

Tree 


in  the  list.  Then  the  tree  following  the  word  is  substituted  for  that 
occurrence  of  the  word.  The  process  is  repeated  until  the  list  is 
exhausted. For example, the tree specification S < S1 S2 > ,
S1 NP < N > , S2 VP < X > , X V results in the same tree shown
above.
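The substitution process can be sketched in Python as follows. The
bracketed syntax (a node optionally followed by its daughters between
< and >) is inferred here from the single example in the text, and the
function names are hypothetical; this is an illustration, not the system's
tree reader.

```python
import re

def tokenize(text):
    """Split a specification into words and the symbols < > ,"""
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[<>,]", text)

def parse_tree(tokens, pos):
    """Parse `node` or `node < tree ... >`; return (tree, next position)."""
    label = tokens[pos]
    pos += 1
    children = []
    if pos < len(tokens) and tokens[pos] == "<":
        pos += 1
        while tokens[pos] != ">":
            child, pos = parse_tree(tokens, pos)
            children.append(child)
        pos += 1  # step over the closing ">"
    return (label, children), pos

def substitute(tree, word, replacement):
    """Replace the leaf labelled `word` by the tree `replacement`."""
    label, children = tree
    if label == word and not children:
        return replacement
    return (label, [substitute(c, word, replacement) for c in children])

def read_specification(text):
    """Read a tree followed by `, word tree` items, substituting in order."""
    tokens = tokenize(text)
    tree, pos = parse_tree(tokens, 0)
    while pos < len(tokens):
        if tokens[pos] != ",":
            raise ValueError("expected a comma between items")
        word = tokens[pos + 1]
        replacement, pos = parse_tree(tokens, pos + 2)
        tree = substitute(tree, word, replacement)
    return tree

def show(tree):
    """Print a tree back in the bracketed form."""
    label, children = tree
    if not children:
        return label
    return label + " < " + " ".join(show(c) for c in children) + " >"

if __name__ == "__main__":
    spec = "S < S1 S2 > , S1 NP < N > , S2 VP < X > , X V"
    print(show(read_specification(spec)))
```

The sketch replaces every leaf bearing the word; under the convention
that the word occurs exactly once, this is the same as replacing that one
occurrence.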


In this and other similar substitutions for a word, it is intended
that the word have exactly one occurrence in the tree.
Occasionally a tabular representation of a tree is preferable,
and  one  is  available  in  the  system.  It  is  used  for  inputs  to  the 
random  generation  routine,  and  as  the  output  format. 

For  a  detailed  discussion  of  internal  and  external  formats 
for trees used in the system see [26].

Tree  operations 

The  basic  operations  for  trees  are  comparisons  and  changes. 

The  basic  tree  comparison  is  equality.  The  test  for  equality  of  trees 
can  be  combined  with  a  test  for  either  equality  or  nondistinctness  of 
their  corresponding  complex  symbols  (see  below).  Trees  may  also  be 
tested to see if they include a specified node (dominance).

Changes  to  trees  include  the  elementary  operations  of  the 
MITRE  grammar  and  the  IBM  Core  grammar.  They  also  include  the  operation 
( tree ) SUBST word which substitutes the tree for an occurrence of
word. This can be used to allow a change to refer to a node inserted
by a previous change in the same transformation.^

^The MITRE programs [5] and Londe and Schoene [10] handle this
problem in other ways.
Analysis


Analyses  occur  in  two  places  in  the  grammar:  in  the  structural 
description  for  a  transformation  and  as  contextual  features. 

The  syntax  for  an  analysis  is  a  strong  generalization  of  the 
notion  of  proper  analysis  originally  given  by  Chomsky.  A  proper 
analysis  is  given  by  a  list  of  nodes  which  are  to  occur  in  a  left  to 
right cut across a tree. The syntax of an analysis here is fully
recursive;  the  terms  of  the  analysis  are  not  simply  nodes  but  structures 
which  may  contain  further  analyses. 

analysis ::= list [ opt [ integer ] term ]

Note that this labelling of terms of an analysis allows the linguist
to number only those terms to which he will refer.

term ::= structure or skip or ( choice )
choice ::= clist [ analysis ]

Any member of the clist will satisfy the choice.

structure ::= element opt [ complex symbol ]
              opt [ opt [ ¬ ] opt [ / ] < analysis > ]

A  structure  is  an  element  which  may  optionally  have  a  complex  symbol 
and  may  optionally  have  a  further  analysis.  The  analysis  of  the 
element may be negative ("not analyzable as", denoted by ¬ ). The
optional slash indicates that the analysis is not necessarily immediate.
Its absence indicates an immediate analysis.

element ::= node or * or _

An element may be a specific node (see definition above) or simply an
unspecified single word indicated by the indefinite node * . The
underline symbol occurs only in analyses which are contextual features,
and indicates the location for lexical insertion. A complex symbol
in an analysis always directly follows an element.

skip ::= # opt [ < structure > ]

The use of skips rather than variables follows the MITRE grammar.

It may be noted that a tree is simply a subcase of structure
in which no integers and none of the special symbols ( , ) , ¬ , / ,
* , and _ occur.




Restriction 


A  restriction  may  occur  only  in  association  with  an  analysis. 

It may be a proper part of a transformation, or may be part of a
contextual feature, or it may define the test for a conditional change
in the structural change of a transformation.




Analysis  algorithm 


The analysis algorithm will be described in detail in [24]. The
one linguistic rule so far incorporated in the system occurs here. A
search is not allowed to go below a sentence symbol unless either the
analysis is part of a transformation which has the parameter which
specifically allows this, or the analysis itself contains a sentence
symbol for which a further analysis is given. Thus there are two ways
to  specify  the  depth  of  a  search. 

Another  interesting  feature  of  the  analysis  algorithm  is  the 
provision for handling the associated restriction. A three-valued
logic is used and the value of the restriction is "undefined" until
the  search  has  proceeded  far  enough  to  determine  a  value  of  "true" 
or  "false"  for  the  whole  restriction.  As  the  search  proceeds  or 
backtracks  the  value  of  the  restriction  is  continually  set  and  unset. 
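A three-valued evaluation of this kind can be sketched as below. The
connectives shown are those of Kleene's strong three-valued logic, which
is an assumption: the text says only that a three-valued logic is used.
The sample restriction and the names and3, or3, not3 and restriction are
hypothetical. The value None stands for "undefined".

```python
def not3(a):
    """Negation: undefined stays undefined."""
    return None if a is None else (not a)

def and3(a, b):
    """Conjunction: false as soon as either operand is false."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Disjunction: true as soon as either operand is true."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def restriction(bound):
    """Hypothetical restriction: term 1 equals term 3, and term 2 is not BY.

    `bound` maps term numbers to the nodes matched so far in the search;
    a condition on an unmatched term is undefined.
    """
    t1, t2, t3 = bound.get(1), bound.get(2), bound.get(3)
    same = None if t1 is None or t3 is None else t1 == t3
    not_by = None if t2 is None else t2 != "BY"
    return and3(same, not_by)

if __name__ == "__main__":
    print(restriction({}))                          # nothing matched yet
    print(restriction({1: "NP", 2: "BY"}))          # already false
    print(restriction({1: "NP", 2: "V", 3: "NP"}))  # fully determined
```

A false value obtained before all terms are matched lets a search abandon
a partial match early; on backtracking the bindings are removed and the
value returns to undefined.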


Complex symbol

Complex symbols occur in trees, in analyses and restrictions, in
the structural change of a transformation, and in the lexical entries
and the redundancy rules of the lexicon.

We  distinguish  between  a  feature  specification  and  a  feature : 
feature specification ::= value feature
Feature  specifications  occur  only  in  complex  symbols. 

A complex symbol is a list of feature specifications enclosed in
vertical bars and is interpreted as a conjunction. A lexical entry
contains a list of complex symbols which is interpreted as a disjunction.

Only the three values + , - and * are allowed.^ Following
UCLA [17] a feature specification with the indefinite value * means
that  the  feature  is  "marked",  without  specifying  whether  it  is 
+  or  -  .  The  value  *  never  appears  in  a  complex  symbol  in  a  tree, 
and  is  never  used  with  a  contextual  feature. 

A  contextual  feature  is  an  analysis  structure  which  contains 

precisely  one  underline  symbol  _  and  whose  head  element  is  a  node. 

It  optionally  has  an  associated  restriction.  The  underline  indicates 
the  node  where  the  lexical  insertion  will  occur.  A  user  who  adheres 
to  Chomsky's  "principle  of  strict  local  subcategorization"  will  use 
as  the  head  element  of  each  contextual  feature  the  node  which  immediately 
dominates  the  one  for  which  the  lexical  insertion  is  to  be  made.  A  user 
who disavows the principle may choose any dominating node for the head
element. Contextual features appear only in the lexicon and are used
solely  in  the  lexical  insertion  process. 


^Gross [6] allows arbitrary words to be declared as values.
Complex symbol operations


The basic operations for complex symbols are comparisons and
changes.

The comparisons are for equality, non-distinctness, and two
types of inclusion. The result of the comparison of two feature
specifications A and B is shown in the tables below, where T
represents true and F represents false and abs indicates that the
feature is absent altogether. For the test to be true for complex
symbols it must be true for all their feature specifications.

[Tables: EQUALITY, NONDISTINCTNESS, INCLUSION-1, INCLUSION-2]
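Two of the comparisons can be sketched as follows, under stated
assumptions. Only the values + and - are treated; non-distinctness is
taken in the usual sense (two specifications are distinct only when one is
+ and the other -), and equality treats an absent feature as unequal to a
present one. These choices are assumptions, as are the feature names, and
the two inclusion tests are not attempted. A complex symbol is represented
as a Python dictionary from feature to value, with None for abs.

```python
def equal_spec(a, b):
    """Equality of two feature specifications (None stands for abs)."""
    return a == b

def nondistinct_spec(a, b):
    """Distinct only when both are present and carry opposite values."""
    return a is None or b is None or a == b

def compare(cs_a, cs_b, spec_test):
    """A test holds for complex symbols if it holds for every feature."""
    features = set(cs_a) | set(cs_b)
    return all(spec_test(cs_a.get(f), cs_b.get(f)) for f in features)

if __name__ == "__main__":
    slot = {"N": "+"}
    boy = {"N": "+", "ANIMATE": "+"}
    rock = {"N": "+", "ANIMATE": "-"}
    print(compare(slot, boy, nondistinct_spec))
    print(compare(slot, boy, equal_spec))
    print(compare(boy, rock, nondistinct_spec))
```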


The basic changes of complex symbols include: merging A into B,
moving the features of A to B, erasing all the features of A from B,
and saving in B only the feature specifications which are included-1
in A. The results of these operations are shown in the tables below.

It is to be expected that other operations will be added later as
required.


COMPONENTS


The three components of a transformational grammar are
phrase structure, lexicon, and transformations.

Phrase structure


The phrase structure of the system is a conventional context-free
grammar. Complex symbols do not appear in the phrase structure;
they are introduced during lexical insertion (see below). Rules are
accepted in a linearization of the standard linguistic form and are
immediately expanded.^ For example, the rule


[Rule display not legible: a rule for VP involving AUX, MV and NP,
with alternatives in braces.]

is represented as

[Linearized form not legible.]

The expression of rule schemas by use of the Kleene star has not
been included.^

^Blair [1] also expands from a compact form.
^Londe [10] accepts the Kleene star.


Lexicon 


A  lexicon  contains  a  preliminary  part,  or  prelexicon,  which 
contains  feature  definitions  and  redundancy  rules.  The  feature 
definitions include a list of categories in the order of lexical insertion.
One  may  also  give  names  to  contextual  features  to  avoid  having  to  write 
them  in  full  in  the  lexical  entries.  A  redundancy  rule  is  of  the  form: 

redundancy rule ::= complex symbol => complex symbol

The interpretation is that if a complex symbol includes all the
feature specifications of the complex symbol to the left of the
arrow ( => ) of a redundancy rule then it implicitly contains those
of the complex symbol to the right of the arrow. Explicit expansion
of  complex  symbols  by  the  redundancy  rules  can  be  carried  out  in  the 
system. 
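Explicit expansion can be sketched as a simple closure computation. The
sketch assumes that rules are applied repeatedly until nothing more can be
added, and that a value already present in the complex symbol is never
overwritten; neither point is settled by the text. The feature names and
the function name expand are hypothetical.

```python
def expand(complex_symbol, rules):
    """Add the features implied by redundancy rules.

    complex_symbol: dict from feature to value ("+" or "-").
    rules: list of (left, right) pairs of such dicts.
    """
    result = dict(complex_symbol)
    changed = True
    while changed:
        changed = False
        for left, right in rules:
            # The rule applies if every specification on the left is present.
            if all(result.get(f) == v for f, v in left.items()):
                for f, v in right.items():
                    if f not in result:  # assumption: explicit values win
                        result[f] = v
                        changed = True
    return result

if __name__ == "__main__":
    rules = [
        ({"HUMAN": "+"}, {"ANIMATE": "+"}),
        ({"ANIMATE": "+"}, {"ABSTRACT": "-"}),
    ]
    print(expand({"N": "+", "HUMAN": "+"}, rules))
```

The loop terminates because each pass either adds a feature or stops, and
the set of features mentioned in the rules is finite.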

In a lexical entry the set of possible complex symbols for a
vocabulary word is given. If several vocabulary words have the identical
set of complex symbols, the vocabulary words appear in a single lexical
entry. Each complex symbol corresponds to a sense of the word. The set
of complex symbols is regarded as a disjunction. Since the complex symbol
itself is a conjunction of feature specifications this is in effect a
normal form. Thus the system has the same power as one which allows
arbitrary boolean combinations of features (see Lakoff [7]), without
their complexity. For example, to say that a verb must have both an
animate subject and an inanimate object, one may use either one or two
feature specifications in the same complex symbol. To say that it must
have either an animate subject or inanimate object, two complex symbols
are  needed. 
Transformations

The final component of a grammar consists of a list of transformations
and a control program. The discussion of the control program will be
deferred to the section on the algorithm for control of transformations.

A transformation consists of a transformation identification,
a structural description, and (optionally) restrictions and structural
change. The transformation identification may include, in addition to
the transformation name, a group number and various parameters. A
transformation may be referenced either by transformation name or by
the group number. The parameters indicate whether or not the transformation
is optional, whether (and how) it is to be repeated after a
successful application, and whether or not the analysis algorithm may
search below an unmentioned sentence symbol. Keywords are also given
here.

The structural change is expressed, as in the MITRE grammar [18],
by a list of operations. A new feature of the system is the
conditional change.

conditional change ::= IF < restriction > THEN
                       < structural change > ELSE
                       < structural change >

The basic operations for trees and complex symbols have already been
discussed.


COMPONENT ALGORITHMS


The three main algorithms of a transformational grammar correspond
to the three components and are phrase structure generation, lexical
insertion and control of transformations. Our implementation of the
first process is designed to be useful in the testing of a grammar.
The second has not previously been fully described and we give for the
first time an explicit algorithm. Various proposals have been made
for the third algorithm; rather than choosing one of them we include the
specification of the algorithm as part of the grammar.

Phrase  structure  generation 

The system can be started with a base tree input by the user.
However, it also has the capability of "directed random" generation of
trees from the phrase structure grammar. This scheme, which is described
in detail in [20], allows the user to specify a "skeleton" around which
a tree is generated at random. The skeleton may also bear constraints
of dominance, nondominance and equality. The scheme was designed to
make it possible for the user to generate trees which are "interesting"
rather than simply random; in particular, which will test a specific
transformation. It should be noted that there is a restriction on the
phrase structure grammars which can be handled by the algorithm:
the rules must be ordered so that no symbol is introduced below the
rule which expands it, with the exception of course of the sentence
symbol.
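The ordering restriction can be checked mechanically. The sketch below
reads "introduced below" as "appears on the right-hand side of a rule that
comes strictly later in the list than the rule expanding the symbol"; that
reading, the flattened rule representation, and the function name are
assumptions made for illustration.

```python
def ordering_violations(rules, sentence_symbol="S"):
    """Return (symbol, introducing rule index, expanding rule index) triples.

    rules: ordered list of (left-hand symbol, list of right-hand symbols),
    with all alternatives of a rule flattened into one list.
    """
    # Position of the rule that expands each symbol.
    expanded_at = {}
    for index, (lhs, _) in enumerate(rules):
        expanded_at.setdefault(lhs, index)

    violations = []
    for index, (_, rhs) in enumerate(rules):
        for symbol in rhs:
            if symbol == sentence_symbol:
                continue  # the sentence symbol is exempt
            expander = expanded_at.get(symbol)
            if expander is not None and expander < index:
                violations.append((symbol, index, expander))
    return violations

if __name__ == "__main__":
    good = [("S", ["NP", "VP"]), ("VP", ["V", "NP", "S"]), ("NP", ["DET", "N", "S"])]
    bad = [("S", ["NP", "VP"]), ("NP", ["DET", "N"]), ("VP", ["V", "NP"])]
    print(ordering_violations(good))
    print(ordering_violations(bad))
```

In the second grammar NP is introduced by the VP rule after the NP rule
has already been passed, so a single top-to-bottom pass over the rules
would leave that NP unexpanded.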
Lexical insertion


The  algorithm  for  lexical  insertion  is  an  interpretation  of  one 
of  the  two  alternatives  presented  by  Chomsky  in  Aspects.  Complex 
symbols  are  introduced  from  the  lexicon  only  after  the  phrase 
structure  generation  of  the  base  tree  is  completed.  In  order  to 
formalize  the  process,  we  have  had  to  make  decisions  on  many  points 
not  treated  explicitly  by  Chomsky.  The  details  are  presented  in  [22]; 
we  note  here  some  of  the  salient  features. 

A  contextual  feature  is  simply  a  special  case  of  analysis ;  thus 
much  of  the  work  in  lexical  insertion  is  done  by  the  same  analysis 
algorithm  used  for  transformations. 

Lexical  insertion  begins  with  the  lowest  embedded  sentence,  and 
works upward.^ Within a sentence the order of lexical insertion is
determined by the list of categories in the prelexicon. This order may
have  considerable  effect  on  the  efficiency  of  the  process.  However, 
from  a  formal  point  of  view,  all  categories  are  alike. 

The  basic  criterion  for  lexical  insertion  is  non-distinctness: 
the tree may already contain a complex symbol; a word and its complex
symbol can be inserted only if the complex symbol is non-distinct from
the one already in the tree. But this is only a necessary condition;
each feature specification for a contextual feature must be checked by
the  analysis  algorithm.  If  the  value  is  +  the  analysis  algorithm 
must  succeed,  and  if  -  it  must  fail. 

^Although  complex  symbols  are  not  introduced  in  the  phrase  structure, 
it is possible that a skeleton input to the phrase structure generation
routine already contains some words of the lexicon. In this case,
the complex symbols for those words are looked up in the lexicon and
inserted  prior  to  the  process  described  here. 


Once  a  vocabulary  word  and  complex  symbol  have  been  selected  (at 
random from those meeting the above tests), one additional step is
necessary  before  lexical  insertion  takes  place.  The  possible  side 
effects  of  the  contextual  features  must  be  taken  care  of.  If,  for 
example,  a  verb  has  been  selected  which  takes  animate  subject  and 
inanimate  object,  feature  specifications  may  need  to  be  added  to  the 
complex  symbols  for  the  subject  and  object.  Then  contextual  features 
are  dropped  from  the  complex  symbol,  since  they  have  served  their 
function, a + or - value replaces the indefinite value * , and
the  vocabulary  word  and  complex  symbol  go  into  the  tree. 
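The admissibility test applied to a candidate complex symbol can be
sketched as below. The analysis algorithm is represented by a stand-in
function supplied by the caller, the indefinite value * and the side
effects on neighbouring complex symbols are not treated, and the feature
name TRANS and its contextual feature are hypothetical. This is an
illustration of the two conditions stated above, not the algorithm of [22].

```python
def admissible(entry, tree_cs, contextual, analysis_succeeds):
    """Decide whether a lexical complex symbol may be inserted.

    entry: candidate complex symbol from the lexicon (feature -> "+" or "-").
    tree_cs: complex symbol already on the node in the tree.
    contextual: dict from contextual feature name to its analysis.
    analysis_succeeds: stand-in for the analysis algorithm; takes an
        analysis and reports whether it matches around the node.
    """
    for feature, value in entry.items():
        if feature in contextual:
            found = analysis_succeeds(contextual[feature])
            # + requires the analysis to succeed, - requires it to fail.
            if (value == "+") != found:
                return False
        elif feature in tree_cs and tree_cs[feature] != value:
            # Opposite values on an inherent feature: the symbols are distinct.
            return False
    return True

if __name__ == "__main__":
    contextual = {"TRANS": "VP < _ NP >"}
    tree_cs = {"V": "+"}
    object_present = lambda analysis: analysis == "VP < _ NP >"
    print(admissible({"V": "+", "TRANS": "+"}, tree_cs, contextual, object_present))
    print(admissible({"V": "+", "TRANS": "-"}, tree_cs, contextual, object_present))
```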




Control  of  transformations 


Each transformational grammar that has discussed at all the matter
of order and point of application of transformations has presented a
slightly different algorithm. From the available examples, it was
possible to abstract the basic ideas involved and to write a simple
programming language in which the linguist can express the algorithm
for a particular grammar.^ The control program refers to transformations
either individually by transformation name or by group number. The
language contains a repeat-instruction which allows a list of control
instructions to be repeated either for a fixed number of times or until
they all fail. One innovation is the IN-instruction. The statement

IN transformation name ( integer ) DO

causes the integer-th term of the transformation to be used as the
starting point for the search algorithm. Such notions as "highest
sentence", "lowest sentence", etc. can be expressed by the IN construct.
The notion of keyword has also been implemented.^

The control language allows branching on the success or failure
of a transformation. The use of this conditional instruction makes it
possible to write transformations with less attention to certain types
of interaction. For example, suppose transformation T2 is to apply
only if T1 has failed to apply. Then the instructions

      IF T1 THEN GO TO A ELSE GO TO B,
   B: T2,
   A:

will cause T2 to be bypassed if T1 applies. This instruction may
be considered excessively powerful. It is available because the
alternatives frequently seem to be either to alter artificially the
structural description of T2 or to include a restriction on T2
such as: "applies only if T1 has failed to apply".^

For a detailed discussion of the control language and examples
of control programs see [23].

We have not attempted to deal with the notion of implicit ordering
of transformations.

In addition to controlling the grammar, the control language also
provides TRACE instructions which govern the amount of output.

^Keywords were first used in the MITRE programs [5]. They were
implemented in a slightly different form by IBM [9].


^The use of the conditional instruction will of course speed up the
processing  of  a  tree. 
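The effect of the conditional instruction can be sketched with a very
small interpreter. The instruction encoding (tuples tagged IF, APPLY and
LABEL) and the function name run are hypothetical, and transformations are
represented by stand-in functions that simply report success or failure;
the control language itself is described in [23]. The program shown is the
case in which T2 is to apply only if T1 has failed to apply.

```python
def run(program, transformations, tree):
    """Execute a control program; return the names of transformations applied."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "LABEL"}
    applied = []
    pc = 0
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "APPLY":
            if transformations[ins[1]](tree):
                applied.append(ins[1])
            pc += 1
        elif ins[0] == "IF":
            _, name, then_label, else_label = ins
            succeeded = transformations[name](tree)
            if succeeded:
                applied.append(name)
            # Branch on the success or failure of the transformation.
            pc = labels[then_label if succeeded else else_label]
        else:  # a LABEL marks a position and does nothing
            pc += 1
    return applied

if __name__ == "__main__":
    program = [
        ("IF", "T1", "A", "B"),
        ("LABEL", "B"),
        ("APPLY", "T2"),
        ("LABEL", "A"),
    ]
    succeed = lambda tree: True
    fail = lambda tree: False
    print(run(program, {"T1": succeed, "T2": succeed}, None))
    print(run(program, {"T1": fail, "T2": succeed}, None))
```

When T1 succeeds control passes directly to the second label and T2 is
never attempted, so T2 needs no restriction mentioning T1.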
THE PROGRAM


The system is written as a collection of subroutines which can
be called in various orders. A table of the subroutine structure is
included in the Programmer's and User's Guide to the System [21].

A  MAIN  program  consists  of  a  sequence  of  subroutine  calls. 
Typically a run begins with a call to the initialization subroutine,
followed by calls to input routines for the components of the grammar.
Then either a base tree is input, or a skeleton is input and the
generation routine called. Lexical insertion is optional at this
point. Then the transformation routine is called, and the program
executes the user's control program. The process can be repeated with
a  new  tree  from  the  skeleton  or  with  a  new  tree  input. 

Alternative  MAIN  programs  to  test  individual  components  of  the 
grammar can easily be constructed. For example, to test the phrase
structure one might simply generate trees at random. Or, to test
lexical insertion one could start with base trees containing incomplete
complex symbols and investigate how they were completed. Transformations
can be tested beginning from base trees with (or without)
lexical  items  already  included. 

MAIN programs for a variety of purposes are also given in (2^1.

The system is implemented in FORTRAN IV (H) on the IBM 360/67.
To the user, however, the system does not look like FORTRAN. All of
the formats are free-field and, externally, words may be up to 40
characters long. See [19] for a description of the free-field
input/output  subroutine  package. 


DIRECTIONS  FOR  FUTURE  WORK 


There  are  many  ways  in  which  the  work  which  has  been  done  can  be
extended.  Some  of  these  correspond  to  interesting  open  questions  in
the  transformational  theory  of  syntax.  We  mention  here  some  areas  in
which  we  plan  to  begin  work  soon.  We  think  that  the  generality  of  the
system  will  give  us  a  strong  starting  point  in  these  investigations.

Conjunction 

No  means  of  handling  transformational  schemas  such  as  conjunction
has  been  provided.  In  the  earlier  programs  at  MITRE  a  conjunction
algorithm  due  to  Schane  [16]  was  included  and  we  plan  to  carry  this
over  into  the  present  system  as  its  first  version  of  conjunction.  We
hope  then  to  investigate  the  alternatives  considered  in  the  literature.

Idioms 

A  common  proposal  for  the  treatment  of  idioms  is  that  an  idiom
occurs  as  a  tree  in  the  lexicon.  We  foresee  only  minor  difficulties
in  incorporating  idioms  in  this  way,  and  plan  to  do  so  when  time  allows.

Linguistic  rules

The  current  trend  in  transformational  linguistics  includes  a
search  for  linguistic  rules  which  would  apply  to  all  grammars.
Ross  [14,  15],  in  particular,  has  been  working  along  these  lines.  We
hope  later  to  investigate  this  work  by  devising  means  of  incorporating
proposed  rules  into  the  system.1/

Lexical  derivation 

The  recent  work  by  Chapin  [2]  and  Chomsky  [4]  on  lexical
derivation  has  opened  up  some  interesting  lines  of  investigation
which  we  are  now  beginning  to  explore  within  the  system.  A  preliminary
study  of  Chapin's  early  work  was  made  prior  to  the  development  of  the
system  and  is  reported  in  [30].

Dependency  grammars 

Jane  Robinson  [12]  has  recently  offered  a  proposal  for
transformational  grammars  in  which  the  underlying  structure  is  a
dependency  grammar.  The  present  system  allows  complex  symbols  to  be
associated  with  any  node  of  a  tree,  but  we  do  not  now  associate  lexical
words  with  higher  nodes  as  would  be  required  by  the  "projectivity"  of
dependency  grammars.

1/  Ross's  rule  of  tree-pruning  has  been  incorporated  by  Gross  [6].

OTHER  TRANSFORMATIONAL  GRAMMAR  SYSTEMS

The  earliest  computer  systems  for  transformational  grammar  were
those  of  Petrick  [11]  and  MITRE  [18].  The  system  here  is  an  outgrowth
and  extension  of  this  early  work  at  MITRE.  Naturally  it  embodies
a  more  recent  version  of  transformational  theory.

The  partial  system  of  Lieberman  and  Blair  [8,  1]  represents  an
early  attempt  to  deal  with  the  model  of  Aspects.  A  lexicon  was  defined,
and  phrase  structure  programs  and  some  transformational  programs  were
written.

Systems  developed  concurrently  with  this  one  include  the
console-controlled  grammar  testers  of  Gross  [6]  and  of  Londe  and
Schoene  [10].1/  The  problems  best  treated  by  a  system  designed  for
immediate  response  to  a  user  at  a  console  differ  from  those  appropriate
to  an  off-line  system  such  as  ours.  While  there  is  some  overlap  in  these
systems,  we  believe  ours  is  the  first  to  consider  all  phases  of
transformational  grammar  in  a  unified  system.  For  example,  the  three
component  algorithms  have  no  correspondents  in  the  other  systems,  and
neither  of  them  has  included  a  lexicon.  Various  differences  in  common
areas  have  been  noted  above.


1/  We  wish  to  thank  both  Dave  Londe  and  Lou  Gross  for  many  pleasant
and  fruitful  discussions,  and  for  a  free  exchange  of  ideas  from
which  our  work  has  benefited.